    The paradoxical role of emotional intensity in the perception of vocal affect

    Vocalizations such as laughter, cries, moans, or screams constitute a potent source of information about the affective states of others. It is typically assumed that the more intense the expressed emotion, the better the classification of affective information. However, attempts to map the relation between affective intensity and inferred meaning remain controversial. Using a newly developed stimulus database of carefully validated non-speech expressions spanning the entire intensity range from low to peak, we show that this intuition is false. Across three experiments (N = 90), we demonstrate that intensity in fact plays a paradoxical role. Participants rated the authenticity, intensity, valence, and arousal of a wide range of vocalizations and classified the expressed emotion. Listeners are clearly able to infer expressed intensity and arousal; in contrast, and surprisingly, emotion category and valence have a perceptual sweet spot: moderate and strong emotions are clearly categorized, but peak emotions are maximally ambiguous. This finding, which converges with related observations from visual experiments, raises interesting theoretical challenges for the emotion communication literature.
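
    The sweet-spot claim has a simple statistical signature: if categorization accuracy follows an inverted U over expressed intensity, the quadratic term of a logistic model should be reliably negative. A minimal sketch with simulated data (all variable names are hypothetical, not taken from the study):

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical trial-level data: 'intensity' is the expressed intensity
    # of a vocalization (0 = low, 1 = peak); 'correct' is whether the
    # listener categorized the emotion correctly.
    rng = np.random.default_rng(0)
    intensity = rng.uniform(0, 1, 2000)
    # Simulate the reported sweet spot: accuracy peaks at moderate intensity.
    p = 1 / (1 + np.exp(-(0.5 + 4 * intensity - 4.5 * intensity**2)))
    df = pd.DataFrame({"intensity": intensity,
                       "correct": rng.binomial(1, p)})

    # A negative quadratic coefficient is the inverted-U signature:
    # categorization improves with intensity up to a point, then degrades
    # for peak emotions.
    fit = smf.logit("correct ~ intensity + I(intensity**2)", data=df).fit()
    print(fit.params)
    ```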

    Perception of Nigerian DĂčndĂșn talking drum performances as speech-like vs. music-like: The role of familiarity and acoustic cues

    It seems trivial to identify sound sequences as music or speech, particularly when the sequences come from different sound sources, such as an orchestra and a human voice. Can we also easily distinguish these categories when the sequences come from the same sound source? And on the basis of which acoustic features? We investigated these questions by examining listeners’ classification of sound sequences performed on an instrument that intertwines speech and music: the dĂčndĂșn talking drum. The dĂčndĂșn is commonly used in south-west Nigeria as a musical instrument but is also well suited to linguistic use as one of the African speech surrogates. One hundred seven participants from diverse geographical locations (15 different mother tongues represented) took part in an online experiment. Fifty-one participants reported being familiar with the dĂčndĂșn talking drum, 55% of those being speakers of YorĂčbĂĄ. During the experiment, participants listened to 30 dĂčndĂșn samples of about 7 s each, performed either as music or as YorĂčbĂĄ speech surrogate (n = 15 each) by a professional musician, and were asked to classify each sample as music or speech-like. The classification task revealed the listeners’ ability to identify the samples as intended by the performer, particularly when they were familiar with the dĂčndĂșn, though even unfamiliar participants performed above chance. A logistic regression predicting participants’ classification of the samples from several acoustic features confirmed the perceptual relevance of intensity, pitch, timbre, and timing measures and their interaction with listener familiarity. In all, this study provides empirical evidence for the discriminating role of acoustic features and the modulatory role of familiarity in teasing apart speech and music.
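
    The reported analysis lends itself to a compact sketch: a logistic regression on trial-level classifications with acoustic predictors crossed with familiarity. The data below are simulated and the predictor names are assumptions, not the study's exact feature set:

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Simulated stand-in for the trial-level data.
    rng = np.random.default_rng(1)
    n = 3000
    df = pd.DataFrame({
        "intensity_var": rng.normal(size=n),    # variability in loudness
        "pitch_mean": rng.normal(size=n),       # average fundamental frequency
        "timbre_centroid": rng.normal(size=n),  # spectral centroid summary
        "ioi_regularity": rng.normal(size=n),   # timing regularity
        "familiar": rng.integers(0, 2, n),      # familiarity with the dĂčndĂșn
    })
    logit = (0.8 * df.ioi_regularity - 0.5 * df.intensity_var
             + 0.6 * df.familiar * df.ioi_regularity)
    df["music_like"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

    # '*' expands to main effects plus interactions with familiarity,
    # mirroring the modulatory role reported in the study.
    model = smf.logit(
        "music_like ~ (intensity_var + pitch_mean + timbre_centroid"
        " + ioi_regularity) * familiar",
        data=df,
    ).fit()
    print(model.summary())
    ```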

    The DĂčndĂșn Drum helps us understand how we process speech and music

    Every day, you hear many sounds in your environment, like speech, music, animal calls, or passing cars. How do you tease apart these unique categories of sounds? We aimed to understand more about how people distinguish speech and music by using an instrument that can both “speak” and play music: the dĂčndĂșn talking drum. We were interested in whether people could tell if the sound produced by the drum was speech or music. People who were familiar with the dĂčndĂșn were good at the task, but so were those who had never heard the dĂčndĂșn, suggesting that there are general characteristics of sound that define speech and music categories. We observed that music is faster, more regular, and more variable in volume than “speech.” This research helps us understand the interesting instrument that is the dĂčndĂșn and provides insights about how humans distinguish two important types of sound: speech and music.

    Exploring emotional prototypes in a high dimensional TTS latent space

    Recent TTS systems are able to generate prosodically varied and realistic speech. However, it is unclear how this prosodic variation contributes to the perception of speakers’ emotional states. Here we use the recent psychological paradigm ‘Gibbs Sampling with People’ to search the prosodic latent space of a trained Global Style Token Tacotron model for prototypes of emotional prosody. Participants are recruited online and collectively manipulate the latent space of the generative speech model in a sequentially adaptive way, so that the stimulus presented to one group of participants is determined by the responses of the previous groups. We demonstrate that (1) particular regions of the model’s latent space are reliably associated with particular emotions, (2) the resulting emotional prototypes are well-recognized by a separate group of human raters, and (3) these emotional prototypes can be effectively transferred to new sentences. Collectively, these experiments demonstrate a novel approach to the understanding of emotional speech by providing a tool to explore the relation between the latent space of generative models and human semantics.
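
    The sequentially adaptive procedure can be sketched as a coordinate-wise loop over latent dimensions: each trial fixes all dimensions but one and lets a participant set that one with a slider. The toy below simulates this; a hidden target vector stands in for both the GST-Tacotron renderer and the human raters, so every name here is illustrative:

    ```python
    import numpy as np

    DIMS = 10                      # dimensionality of the (reduced) latent space
    GRID = np.linspace(-3, 3, 25)  # slider positions mapped to latent values

    # Hidden 'emotional prototype' used only to simulate listeners; in the
    # real experiment each slider setting is rendered to speech by the
    # GST-Tacotron model and judged by participants.
    target = np.zeros(DIMS)
    target[3] = 2.0

    def simulated_slider_response(latent, dim):
        """Pick the slider value whose stimulus best matches the criterion
        (here: proximity to the hidden target)."""
        candidates = np.repeat(latent[None, :], len(GRID), axis=0)
        candidates[:, dim] = GRID
        return GRID[np.argmin(np.linalg.norm(candidates - target, axis=1))]

    latent = np.zeros(DIMS)
    for it in range(3 * DIMS):     # a few sweeps; each trial updates one dimension
        dim = it % DIMS
        latent[dim] = simulated_slider_response(latent, dim)
    print(latent.round(2))         # converges toward the simulated prototype
    ```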

    Gibbs sampling with people

    A core problem in cognitive science and machine learning is to understand how humans derive semantic representations from perceptual objects, such as color from an apple, pleasantness from a musical chord, or seriousness from a face. Markov Chain Monte Carlo with People (MCMCP) is a prominent method for studying such representations, in which participants are presented with binary choice trials constructed such that the decisions follow a Markov Chain Monte Carlo acceptance rule. However, while MCMCP has strong asymptotic properties, its binary choice paradigm generates relatively little information per trial, and its local proposal function makes it slow to explore the parameter space and find the modes of the distribution. Here we therefore generalize MCMCP to a continuous-sampling paradigm, where in each iteration the participant uses a slider to continuously manipulate a single stimulus dimension to optimize a given criterion such as 'pleasantness'. We formulate both methods from a utility-theory perspective, and show that the new method can be interpreted as 'Gibbs Sampling with People' (GSP). Further, we introduce an aggregation parameter to the transition step, and show that this parameter can be manipulated to flexibly shift between Gibbs sampling and deterministic optimization. In an initial study, we show GSP clearly outperforming MCMCP; we then show that GSP provides novel and interpretable results in three other domains, namely musical chords, vocal emotions, and faces. We validate these results through large-scale perceptual rating experiments. The final experiments use GSP to navigate the latent space of a state-of-the-art image synthesis network (StyleGAN), a promising approach for applying GSP to high-dimensional perceptual spaces. We conclude by discussing future cognitive applications and ethical implications.
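
    One way to read the aggregation parameter (an interpretation for illustration, not the paper's exact formulation): if each slider response is a draw from a softmax over the participant's utility along the manipulated dimension, sharpening that softmax shifts the chain from Gibbs sampling toward deterministic optimization:

    ```python
    import numpy as np

    def gsp_step(utility, grid, k=1, seed=None):
        """One GSP coordinate update along a single stimulus dimension.

        Slider positions are chosen with probability proportional to
        exp(k * utility). Reading k as the aggregation parameter: k = 1
        gives Gibbs-like sampling from the utility-derived distribution,
        while large k approaches deterministic optimization (always the
        utility-maximizing position).
        """
        rng = np.random.default_rng(seed)
        u = np.array([utility(x) for x in grid])
        p = np.exp(k * (u - u.max()))   # subtract max for numerical stability
        return rng.choice(grid, p=p / p.sum())

    # Example: a bimodal 'pleasantness' profile along one dimension.
    grid = np.linspace(-3, 3, 61)
    pleasantness = lambda x: -2 * min((x - 1.5) ** 2, (x + 1.0) ** 2)

    print(gsp_step(pleasantness, grid, k=1))   # noisy: visits both modes
    print(gsp_step(pleasantness, grid, k=50))  # near-deterministic: best mode
    ```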

    The Timbre Perception Test (TPT): A new interactive musical assessment tool to measure timbre perception ability

    To date, tests that measure individual differences in the ability to perceive musical timbre are scarce in the published literature. The lack of such a tool limits research on how timbre, a primary attribute of sound, is perceived and processed among individuals. The current paper describes the development of the Timbre Perception Test (TPT), in which participants use a slider to reproduce heard auditory stimuli that vary along three important dimensions of timbre: envelope, spectral flux, and spectral centroid. With a sample of 95 participants, the TPT was calibrated and validated against measures of related abilities and examined for its reliability. The results indicate that a short version (8 minutes) of the TPT has good explanatory support from a factor analysis model, acceptable internal reliability (α = .69, ω_t = .70), good test–retest reliability (r = .79), and substantial correlations with self-reported general musical sophistication (ρ = .63) and pitch discrimination (ρ = .56), as well as somewhat lower correlations with duration discrimination (ρ = .27) and musical instrument discrimination abilities (ρ = .33). Overall, the TPT represents a robust tool for measuring an individual’s timbre perception ability. Furthermore, the use of sliders to perform a reproduction task has been shown to be an effective approach in threshold testing. The current version of the TPT is openly available for research purposes.
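
    For readers who want to compute comparable timbre descriptors on their own audio, a sketch using librosa (the library choice and the synthetic test signal are assumptions; the TPT itself ships its own calibrated stimuli):

    ```python
    import numpy as np
    import librosa

    # A synthetic test tone (a pitch glide) stands in for a real recording.
    sr = 22050
    y = librosa.chirp(fmin=220, fmax=880, sr=sr, duration=2.0)

    # Spectral centroid: the 'brightness' dimension.
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr).mean()

    # Spectral flux: frame-to-frame change in the magnitude spectrum.
    S = np.abs(librosa.stft(y))
    flux = np.sqrt((np.diff(S, axis=1) ** 2).sum(axis=0)).mean()

    # Amplitude envelope (the third dimension) summarized via frame RMS.
    rms = librosa.feature.rms(y=y).mean()

    print(f"centroid ≈ {centroid:.0f} Hz, mean flux ≈ {flux:.2f}, rms ≈ {rms:.3f}")
    ```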

    Evaluation tools in singing education: A comparison of human and technological measures

    Two methods are currently used to assess intonation in vocal performance: human listeners and computer tools. In theory, evaluation relies on musical criteria, and the more closely performers sing in accordance with these criteria, the more accurate they are judged to be. In practice, evaluating musical performances must take into account several elements, such as the type of performance, the musician, and the judge carrying out the evaluation. By discussing the advantages and limits of the two current methods within the general framework of intonation evaluation, this chapter aims to provide practical information for making informed decisions when evaluating singing performances.
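
    On the computer-tool side, the core intonation measure is easy to state: the deviation, in cents, of a sung fundamental frequency from the nearest equal-tempered pitch. A minimal sketch, assuming f0 values have already been extracted by a pitch tracker:

    ```python
    import numpy as np

    def cents_off(f0_hz, a4=440.0):
        """Deviation from the nearest equal-tempered pitch, in cents
        (100 cents = one semitone). Positive values mean sharp."""
        semitones = 12 * np.log2(np.asarray(f0_hz) / a4)
        return 100 * (semitones - np.round(semitones))

    # Two notes sung near A4 and C5, slightly sharp and flat respectively.
    print(cents_off([446.0, 519.0]))  # ≈ [+23.4, -14.1] cents
    ```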